Abstract

An approach to accelerating the boosting of parametric weak
classifiers is proposed. A weak classifier is called parametric if it
has a fixed number of parameters and can therefore be represented as
a point in a multidimensional space. A genetic algorithm is used
instead of exhaustive search to learn the parameters of such a
classifier. The proposed approach also takes into account the case
when an efficient algorithm exists for learning some of the
classifier parameters. Experiments confirm that this approach can
dramatically decrease classifier training time while keeping both
training and test errors small.

Keywords: boosting, genetic algorithm, classification, Haar feature

1 Introduction 

Boosting is one of the most commonly used approaches to classifier
learning. It is a machine learning meta-algorithm that iteratively
learns an additive model consisting of weighted weak classifiers that
belong to some classifier family $W$. In the case of a two-class
classification problem (which we consider in this paper), the boosted
classifier usually has the form

$$s(y) = \operatorname{sgn}\left(\sum_{i} \alpha_i w_i(y)\right). \qquad (1)$$

Here $y \in \mathcal{Y}$ is a sample to classify, $w_i \in W$ are the
weak classifiers learned during the boosting procedure, $\alpha_i$
are the weak classifier weights, $w_i(y) \in \{-1, 1\}$ and
$s(y) \in \{-1, 1\}$. The set $W$ is referred to as the weak
classifier family because its elements are only required to have an
error rate slightly better than random guessing. This expresses the
key idea of boosting: a strong classifier can be built on top of many
weak ones.
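
As a minimal illustration of (1), the following Python sketch evaluates a boosted classifier as the sign of a weighted vote. The names `strong_classify`, `weak_classifiers` and `alphas` are hypothetical, and mapping a zero sum to +1 is an assumed tie-breaking convention that the text does not specify.

```python
from typing import Callable, Sequence, TypeVar

Sample = TypeVar("Sample")


def strong_classify(
    y: Sample,
    weak_classifiers: Sequence[Callable[[Sample], int]],
    alphas: Sequence[float],
) -> int:
    """Return sgn(sum_i alpha_i * w_i(y)) with values in {-1, +1}."""
    # Each weak classifier votes -1 or +1; the alphas weight the votes.
    total = sum(a * w(y) for a, w in zip(alphas, weak_classifiers))
    # Assumption: a zero sum is mapped to +1.
    return 1 if total >= 0 else -1


# Toy usage: three threshold stumps on a scalar sample.
stumps = [
    lambda v: 1 if v > 0.2 else -1,
    lambda v: 1 if v > 0.5 else -1,
    lambda v: 1 if v > 0.8 else -1,
]
vote_weights = [0.5, 1.0, 0.7]
label = strong_classify(0.6, stumps, vote_weights)
```

Each stump alone is a crude rule; the weighted vote is what produces the final decision.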

There are many boosting procedures, which differ in the type of loss
being optimized for the final classifier. But no matter which
boosting procedure is used, on each iteration it must select (learn)
a weak classifier with minimal weighted loss from the family $W$
using a special algorithm called the weak learner. Fast and accurate
optimization methods are often not applicable here (especially when
the classifier parameters are discrete), so exhaustive search over
the weak classifier parameter space is used as the weak learner.
Unfortunately, exhaustive search can take a lot of time. For example,
learning a cascade of boosted classifiers based on Haar features with
AdaBoost and exhaustive search over the classifier parameter space
took several weeks in the well-known work [Viola and Jones 2001]. It
is therefore often very important to decrease the weak classifier
learning time using an appropriate numerical optimization approach.
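
To make the cost of the baseline concrete, here is a hedged sketch of an exhaustive-search weak learner. It assumes a weighted misclassification loss and a discrete grid of parameter values; `exhaustive_weak_learner`, `make_classifier` and the toy stump are hypothetical names used only for illustration.

```python
import itertools


def weighted_error(predict, samples, labels, weights):
    """Sum of the weights of the misclassified samples."""
    return sum(w for x, c, w in zip(samples, labels, weights) if predict(x) != c)


def exhaustive_weak_learner(param_grid, make_classifier, samples, labels, weights):
    """Try every parameter combination and keep the one with minimal loss."""
    best_params, best_loss = None, float("inf")
    # The number of candidates is the product of the grid sizes, which is
    # what makes this search slow for large classifier families.
    for params in itertools.product(*param_grid):
        loss = weighted_error(make_classifier(*params), samples, labels, weights)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Toy usage: a stump with a threshold t and a sign g on scalar samples.
samples = [0, 1, 2, 3]
labels = [-1, -1, 1, 1]
weights = [0.25] * 4


def make_stump(t, g):
    return lambda x: g if x >= t else -g


params, loss = exhaustive_weak_learner(
    [range(5), (-1, 1)], make_stump, samples, labels, weights
)
```

Every additional parameter multiplies the number of loss evaluations, which is the motivation for replacing the inner loop with a guided search.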

One of the widely used approaches to numerical optimization is the
genetic algorithm [Goldberg 1989]. It is based on ideas from
biological evolution. A solution of the optimization problem is coded
as a chromosome vector. An initial population of solutions is created
using a random number generator. A fitness function is then used to
assign a fitness value to every population member. Solutions with the
highest fitness values are selected for the next step, in which
genetic operators (usually crossover and mutation) are applied to the
selected chromosomes to produce new solutions and to modify existing
ones slightly. The modified solutions form a new generation, and the
described process repeats. This is how evolution is modeled. It
continues until a global or suboptimal solution is found or the time
allowed for evolution runs out. Genetic algorithms are often used to
search for a global extremum in large and complicated search spaces,
which makes the genetic algorithm a good candidate for a weak
classifier learner.

2 Related work 

The use of a genetic algorithm for weak learner acceleration has
already been proposed in several works. For example, in [Treptow and
Zell 2004] a genetic weak learner with special crossover and mutation
operators was used to learn a classifier based on an extended Haar
feature set. In [Ramirez 2007] a genetic algorithm was used to select
a few thousand weak classifiers with the smallest error on the
unweighted training set before the boosting process starts.
Exhaustive search over the selected classifiers was then performed on
each boosting iteration to select the one with minimal weighted loss.
In [Masada et al. 2008] the boosting procedure was completely
integrated with the genetic algorithm. A few classifiers were
selected from the solution population on each boosting iteration and
added to the strong classifier. The selected classifiers were then
used to produce new population members by applying genetic operators.
Finally, in [Abramson et al. 2006] the authors used as a weak learner
a special evolutionary algorithm that they called Evolutionary
Hill-Climbing. It does not use a crossover operator. Instead, 5
different mutations are applied to every population member on each
iteration of the algorithm. The result of each mutation is rejected
when it does not improve the fitness function value.

There were two main reasons for using genetic search rather than
other approaches in these works. Most of the classifiers used in them
were extensions of the Haar classifier family originally proposed in
[Viola and Jones 2001]. First, the huge size of such a weak
classifier family does not allow optimization based on exhaustive
search to be applied. Second, the complicated discrete structure of
the weak classifier rules out all other optimization options.

Another important observation is that in each case the authors had to
implement a specialized solution for the genetic weak learner. This
work therefore investigates whether the evolutionary approach to
learning a weak classifier can be generalized.

3 Proposed method 

We are interested in developing a general approach to learning a weak
classifier. This approach should work much faster than exhaustive
search over the classifier parameter space. One such approach is
presented in the following sections. It is based on the fact that
when the number of classifier parameters to optimize is fixed, the
weighted loss optimization problem becomes a multivariate function
minimization problem, which is a well-developed area of genetic
algorithm application.



3.1 Population member 

Let $W$ be some parametric family of weak classifiers. It means that
every weak classifier $w \in W$ can be described by a set of its $n$
real-valued parameters $x_1, \ldots, x_n$. Let us also assume that
for the last $l$ parameters ($l$ can be equal to zero) there exists
an efficient learning algorithm
$L_e : \mathbb{R}^{n-l} \to \mathbb{R}^{l}$. We will refer to such
parameters as linked. For given values of the parameters
$x_1, \ldots, x_{n-l}$, called free, $L_e$ finds the optimal values
of the linked parameters that minimize the loss function
$E : \mathbb{R}^{n} \to \mathbb{R}_+$. It means that our task is to
find the values of the free parameters that minimize the loss
function $E[x_1, \ldots, x_{n-l}, L_e(x_1, \ldots, x_{n-l})]$. So the
set of parameters $x_1, \ldots, x_{n-l}$ represents a solution to our
optimization problem and forms a member of the genetic algorithm
population.

3.2 Fitness function 

It is natural to assume that a classifier with a small error on the
training set should have a greater probability of getting into the
next generation of the genetic algorithm. That allows us to introduce
the fitness function $F : \mathbb{R}^{n-l} \to \mathbb{R}_+$ as
follows:

$$F(x_1, \ldots, x_{n-l}) = \frac{1}{E[x_1, \ldots, x_{n-l}, L_e(x_1, \ldots, x_{n-l})]}. \qquad (2)$$

We do not consider the case $E = 0$. A classifier cannot be called
weak if it has zero error on the training set. If such a classifier
is present in the weak classifier family, we can simply select that
classifier alone as the result of the whole boosting procedure.
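
The composition of the loss with the linked-parameter learner can be sketched as follows. This is an illustration under assumptions: `make_fitness`, `loss` and `linked_learner` are hypothetical names, and the toy loss is invented purely to show the mechanics of (2).

```python
def make_fitness(loss, linked_learner):
    """Build F(free) = 1 / E(free, L_e(free)) from a loss E and a learner L_e."""

    def fitness(free):
        # The linked parameters are not searched genetically: they are
        # computed from the free ones by the efficient learner L_e.
        linked = linked_learner(free)
        e = loss(tuple(free) + tuple(linked))
        if e <= 0:
            # The case E = 0 is excluded in the text.
            raise ValueError("loss must be positive")
        return 1.0 / e

    return fitness


# Toy example with n = 2 and l = 1: for a fixed x1 the loss below is
# minimized over x2 by x2 = x1, so the linked learner returns (x1,).
def toy_loss(x):
    x1, x2 = x
    return (x1 - 1.0) ** 2 + (x2 - x1) ** 2 + 0.5


def toy_linked_learner(free):
    return (free[0],)


fitness = make_fitness(toy_loss, toy_linked_learner)
```

With this construction the genetic search only ever sees the free parameters, which keeps the chromosome short.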

3.3 Genetic representation 

Any approach that allows us to code a set of free parameters is
appropriate for representing a population member. In this work we
selected the binary string representation, which has proved effective
in function optimization problems. Some alternative representations
can be found, for example, in [Goldberg 1989].

To form the binary string representation of a classifier, each
classifier parameter is first represented as a binary string of fixed
length using fixed-precision encoding. All the parameters are then
simply concatenated to form the final binary string of fixed length.
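
A minimal sketch of such an encoding is given below. It assumes that every parameter lies in a known interval `[lo, hi]` and is quantized to the same number of bits; these choices and the function names are illustrative assumptions, not details taken from the paper.

```python
def encode(values, bits, lo, hi):
    """Quantize each value in [lo, hi] to `bits` bits and concatenate."""
    levels = (1 << bits) - 1
    parts = []
    for v in values:
        if not lo <= v <= hi:
            raise ValueError("value outside the encodable range")
        q = round((v - lo) / (hi - lo) * levels)
        parts.append(format(q, "0{}b".format(bits)))
    return "".join(parts)


def decode(chromosome, bits, lo, hi):
    """Split the string into `bits`-wide fields and map them back to [lo, hi]."""
    levels = (1 << bits) - 1
    return [
        lo + int(chromosome[i:i + bits], 2) / levels * (hi - lo)
        for i in range(0, len(chromosome), bits)
    ]


# Two parameters, 5 bits each, range [0, 31]: one quantization level per unit.
chromosome = encode([0, 23], bits=5, lo=0.0, hi=31.0)
restored = decode(chromosome, bits=5, lo=0.0, hi=31.0)
```

The number of bits per parameter fixes the precision of the search and the length of the chromosome.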

Sometimes a point $p \in \mathbb{R}^n$ may have no corresponding
classifier. For various families of image region classifiers this can
happen, for example, when one of the free parameters representing the
top-left corner of the classifier window is below zero. In this case
the fitness function value for the population member representing
that point can be forced to zero. This is how such situations were
handled in the experiments described in section 4. Another possible
approach is to select the representation and the genetic operators in
a way that simply does not allow such points to appear, but that
approach is less general.
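
The zero-fitness rule for invalid points might look like the sketch below. The validity test (a window that must lie inside a 24 x 24 image) and the names `region_is_valid` and `guarded_fitness` are assumptions made for illustration.

```python
def region_is_valid(x, y, width, height, image_width, image_height):
    """True if the window is non-empty and lies fully inside the image."""
    return (
        x >= 0
        and y >= 0
        and width > 0
        and height > 0
        and x + width <= image_width
        and y + height <= image_height
    )


def guarded_fitness(params, raw_fitness, image_width=24, image_height=24):
    """Return 0 for points with no corresponding classifier, else the fitness."""
    x, y, width, height = params
    if not region_is_valid(x, y, width, height, image_width, image_height):
        # Such members can never be among the best ones, so elitist
        # selection removes them from the population.
        return 0.0
    return raw_fitness(params)
```

Because the fitness in (2) is strictly positive for every real classifier, a value of zero ranks invalid points below all valid ones.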

3.4 Genetic operators 

In this work we used the two most common genetic operators: mutation
and crossover. For the binary string representation, mutation and
crossover are usually defined as follows:

• The crossover operator selects a random position in the binary
string and then swaps all the bits to the right of the selected
position between two chromosomes. This crossover implementation is
called one-point crossover.

• The mutation operator flips the value of a random chromosome bit.

In our case, the crossover operator produces two new solutions from
the two given chromosomes as follows: the parameters placed to the
left of the selected position are taken from the first classifier,
and the parameters placed to the right are taken from the second. One
parameter, the one whose bits contain the selected position, may be
composed of bits from both the first and the second classifier. The
mutation operator simply produces a new solution by changing the
value of a random classifier parameter.
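
Both operators can be sketched on Python bit strings as follows. The function names and the use of `random.Random` are illustrative choices; the crossover point is drawn so that both parents contribute at least one bit, which is an assumption.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit strings at a random position."""
    point = rng.randrange(1, len(a))  # requires len(a) >= 2
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, rng):
    """Flip one randomly chosen bit."""
    i = rng.randrange(len(chromosome))
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]


rng = random.Random(0)
parent_a, parent_b = "0000000000", "1111111111"
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, rng)
```

When the crossover point falls inside the bit field of a parameter, that parameter of the child mixes bits of both parents, as noted above.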

3.5 Algorithm summary 



Algorithm 1 Genetic weak learner
  Generate an initial population of $N$ random binary strings;
  for $i = 1, \ldots, K_{max}$ do
    Add $\lceil N R_c \rceil$ members to the population by applying
      the crossover operator to pairs of the best population members;
    Apply the mutation operator to $\lceil N R_m \rceil$ random
      population members;
    Calculate the value of (2) for each population member;
    Remove all population members except the $N$ best (the ones with
      the highest value of (2));
  end for
  return the weak classifier associated with the point represented by
    the best population member;



Algorithm 1 uses elitism as its population member selection approach.
It has 4 parameters:

• $N > 0$: population size.

• $K_{max} > 0$: number of generations.

• $R_c \in (0, 1]$: crossover rate.

• $R_m \in (0, 1]$: mutation rate.
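
Algorithm 1 can be sketched in Python roughly as follows. Two details are assumptions, since the pseudocode leaves them open: "pairs of the best members" is read as neighbours in the fitness ranking, and mutants are added as copies so that the current best member is never overwritten. The fitness argument stands for (2) evaluated on a decoded chromosome.

```python
import functools
import math
import random


def genetic_weak_learner(fitness, n_bits, N, K_max, R_c, R_m, seed=0):
    """Sketch of Algorithm 1 on bit-string chromosomes (needs n_bits >= 2)."""
    rng = random.Random(seed)
    # Cache fitness values so each distinct chromosome is evaluated once.
    score = functools.lru_cache(maxsize=None)(fitness)

    def crossover(a, b):
        point = rng.randrange(1, n_bits)
        return a[:point] + b[point:], b[:point] + a[point:]

    def mutate(c):
        i = rng.randrange(n_bits)
        return c[:i] + ("1" if c[i] == "0" else "0") + c[i + 1:]

    # Initial population of N random binary strings.
    population = [
        "".join(rng.choice("01") for _ in range(n_bits)) for _ in range(N)
    ]
    n_cross = math.ceil(N * R_c)
    n_mut = math.ceil(N * R_m)

    for _ in range(K_max):
        # Assumption: cross neighbouring members of the fitness ranking.
        ranked = sorted(population, key=score, reverse=True)
        offspring = []
        k = 0
        while len(offspring) < n_cross:
            offspring.extend(crossover(ranked[k % N], ranked[(k + 1) % N]))
            k += 1
        offspring = offspring[:n_cross]
        # Assumption: mutants are new copies of randomly chosen members.
        pool = ranked + offspring
        mutants = [mutate(rng.choice(pool)) for _ in range(n_mut)]
        # Elitist selection: keep the N members with the highest fitness.
        population = sorted(pool + mutants, key=score, reverse=True)[:N]

    return max(population, key=score)


# Toy usage: maximize the number of ones (offset by 1 to keep fitness > 0).
def count_ones(c):
    return 1 + c.count("1")


best = genetic_weak_learner(count_ones, n_bits=16, N=20, K_max=30, R_c=0.5, R_m=0.2)
```

Under these assumptions the number of fitness evaluations per generation is bounded by the population size plus the number of offspring and mutants, whatever the size of the classifier family.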

3.6 Discussion 

The advantage of the proposed method is that the computational
complexity of the weak learner does not depend on the size of the
weak classifier family. One can balance training time against
classifier performance simply by changing the values of $N$,
$K_{max}$ and $S$ (the number of runs, discussed in section 4.2). A
similar effect can be achieved by shrinking the weak classifier
family itself, but in most cases prior knowledge about weak
classifier performance in boosting is simply not available.

One of the main disadvantages of the proposed weak learner is that
many potentially interesting weak classifiers cannot be represented
as a parameter vector of constant length. For example, decision
trees, widely used in boosting, can have a variable number of nodes.
In addition, the misclassification loss we want to optimize should be
a reasonably stable function of the free classifier parameters. If
small perturbations of the free parameter vector lead to
unpredictable changes in the loss function value, genetic
optimization does not make much sense, as it degenerates into random
search. Unfortunately, that situation occurs quite often, especially
if the number of classifier parameters is small. A common example is
the situation when one of the free parameters is a feature index and
features with close indices are not correlated at all.



4 Experiments 

4.1 Algorithms for experiments 

Two boosting-based algorithms were implemented to compare the
proposed genetic weak learner with the original learners proposed by
the algorithm authors. Viola-Jones [Viola and Jones 2001] and face
alignment via boosted ranking model [Wu et al. 2008] were selected
for that purpose because both algorithms use parametric weak
classifiers applied to image regions. These algorithms are based on
distinct boosting procedures (AdaBoost and GentleBoost), so the loss,
sample weight and classifier weight functions used in them differ
considerably. Another difference between the selected algorithms is
the problem they solve: two-class classification in [Viola and Jones
2001] and ranking in [Wu et al. 2008]. The training time of a naive
implementation is quite long for both algorithms, so acceleration of
the boosting process is necessary.

Weak classifiers used in both algorithms are based on Haar features
and have a common set of adjustable parameters. So the weak
classifier in both problems can be represented as
$w_i = (x_i, y_i, \mathit{width}_i, \mathit{height}_i, \mathit{type}_i, g_i, t_i)$.
Here $x_i$, $y_i$, $\mathit{width}_i$ and $\mathit{height}_i$
describe the image region, $\mathit{type}_i$ encodes the Haar feature
type, $g_i$ is the Haar feature sign and $t_i$ is the weak classifier
threshold. Parameters $g_i$ and $t_i$ are linked because both
algorithms have an efficient algorithm for learning them. Parameter
$\mathit{type}_i$ was also made linked: changing the feature type
during genetic optimization does not make much sense because it can
change the fitness function value significantly after just one
mutation or crossover. Instead, a separate run of the algorithm was
performed for each feature type, and the best result over all runs
was then selected. We used the same 5 Haar feature types as in [Wu et
al. 2008] for training both classifiers.
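
For illustration, one common way to learn a threshold and a sign efficiently is a single pass over the sorted feature responses. The sketch below assumes a weighted misclassification loss and a stump of the form "predict $g$ if the response is at least $t$, otherwise $-g$"; this parametrization and the function name are assumptions, and the learners in the two compared algorithms may differ in detail.

```python
def learn_threshold_and_sign(responses, labels, weights):
    """Return (t, g, error) minimizing the weighted error of the stump
    h(r) = g if r >= t else -g, with t taken from the observed responses."""
    order = sorted(range(len(responses)), key=lambda i: responses[i])
    total_pos = sum(w for w, c in zip(weights, labels) if c == 1)
    total_neg = sum(w for w, c in zip(weights, labels) if c == -1)
    pos_below = neg_below = 0.0  # weight of samples strictly below t
    best_t, best_g, best_err = None, None, float("inf")
    previous = None
    for i in order:
        r = responses[i]
        if r != previous:  # evaluate each distinct threshold once
            # g = +1 misclassifies positives below t and negatives at or above t.
            err_plus = pos_below + (total_neg - neg_below)
            # g = -1 misclassifies negatives below t and positives at or above t.
            err_minus = neg_below + (total_pos - pos_below)
            for g, err in ((1, err_plus), (-1, err_minus)):
                if err < best_err:
                    best_t, best_g, best_err = r, g, err
            previous = r
        if labels[i] == 1:
            pos_below += weights[i]
        else:
            neg_below += weights[i]
    return best_t, best_g, best_err


t, g, err = learn_threshold_and_sign(
    [0.1, 0.4, 0.35, 0.8], [-1, 1, -1, 1], [0.25, 0.25, 0.25, 0.25]
)
```

After sorting, the scan costs linear time in the number of samples, which is why these two parameters do not need to be part of the chromosome.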

4.2 Run patterns 

A comparison of two different genetic algorithm run patterns was also
performed in this work. The first pattern is to run the genetic
optimization once with a large population. The second pattern is to
run the optimization algorithm multiple times (the number of runs is
denoted $S$) with a small population and then select the best
classifier found. When the population is small, the final solution
depends strongly on the initial population, so considerably different
results can be obtained in different runs. While this run pattern
produces worse classifiers, it can be implemented very efficiently on
multiprocessor and multicore architectures: each processing unit can
run its own genetic simulation. This allows the algorithm to be
parallelized almost perfectly.
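
The second run pattern can be sketched with the standard `concurrent.futures` module. Here `run_once` is a hypothetical callable that performs one small-population genetic run for a given seed and returns a (solution, fitness) pair; `toy_run` is only a stand-in for it.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def best_of_runs(run_once, S, executor=None):
    """Run `run_once(seed)` for S seeds and keep the highest-fitness result."""
    seeds = range(S)
    if executor is None:
        results = [run_once(seed) for seed in seeds]
    else:
        # The runs are independent, so they can be mapped onto workers.
        results = list(executor.map(run_once, seeds))
    return max(results, key=lambda pair: pair[1])


def toy_run(seed):
    """Hypothetical stand-in for one small-population genetic run."""
    rng = random.Random(seed)
    solution = rng.random()
    return solution, 1.0 / (1.0 + abs(solution - 0.5))


with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_best = best_of_runs(toy_run, S=10, executor=pool)
serial_best = best_of_runs(toy_run, S=10)
```

A thread pool keeps the example simple. In CPython, CPU-bound pure-Python runs would need a process pool (or native code) to actually execute in parallel.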

4.3 Training and test sets 

As in [Treptow and Zell 2004], the human face database of [Carbonetto
2002] was used to train and test the classifier for the Viola-Jones
algorithm. The database was divided in half to form the training and
test sets. Each sample is 24 x 24 pixels in size.

Face images with landmarks from the FG-NET aging database were used
to form the database for learning the face alignment ranker proposed
in [Wu et al. 2008]. 600 face images were selected from the database
and resized to 40 x 40 pixels. 400 images were used to produce the
training set and the other 200 were used for testing. 10 sequential
6-step random landmark position perturbations were then applied to
the selected face images to produce images of misaligned faces, as
described in the original paper. Training and test samples were then
formed from pairs of images with increasing alignment quality.



Table 1: Viola-Jones, acceleration

Run pattern           Time (sec)   Acceleration
S     N     Kmax
1     50    10        2.82         329.38
1     100   20        9.40         98.77
1     400   40        100.29       9.26
10    10    20        4.00         231.94
20    20    40        28.74        32.31
Brute force           928.52       1.00


Table 2: Viola-Jones, error

Run pattern           Error
S     N     Kmax      Training     Test
1     50    10        0.0005       0.0356
1     100   20        0.0002       0.0380
1     400   40        0.0000       0.0328
10    10    20        0.0003       0.0378
20    20    40        0.0000       0.0391
Brute force           0.0000       0.0349



4.4 Hardware 

All experiments were performed on a PC equipped with a 2.33 GHz Intel
Core 2 Quad processor and 2 GB of DDR2 RAM.

4.5 Results 

Tables 1 and 3 show the average duration of one boosting iteration
together with the acceleration relative to exhaustive search. Tables
2 and 4 show the error rates of the final classifiers on the training
and test sets. We did not train any classifier using exhaustive
search for the boosted ranking model because it would take about a
year to finish the process on our training set.

Experiments with the Viola-Jones object detector showed that a
classifier trained using the genetic weak learner performs only
slightly worse than a classifier trained using exhaustive search over
the classifier space. For $N = 400$ the final classifier even shows
better test performance. The classifier trained with $S = 1$,
$N = 50$ and $K_{max} = 10$ accelerates boosting by a factor of more
than 300 compared to exhaustive search while still performing well on
the test set. Classifiers trained with small $N$ and large $S$ values
(using the second run pattern) generally perform worse than the
others. But, as mentioned before, such classifiers can be trained
very efficiently on multiprocessor or multicore systems.

Experiments with face alignment via boosted ranking model (BRM)
showed how classifier performance depends on the values of $S$, $N$
and $K_{max}$. Increasing the value of each parameter increases the
training time, but also improves classifier performance.
Nevertheless, the difference in training time is much more
significant than the difference in prediction error. The classifier
with $S = 1$, $N = 25$, $K_{max} = 10$ was trained about 50 times
faster than the best classifier obtained for BRM, but its error is
only 1.2 times higher. This makes such a classifier a good candidate
for the preliminary experiments that usually take place before
training of the final classifier starts.

Table 3: Face alignment via BRM, acceleration

Run pattern           Time (sec)   Acceleration
S     N     Kmax
1     25    10        68.15        5195.88
1     50    10        173.33       2043.09
2     75    15        909.55       389.34
4     100   20        3582.37      98.85

Table 4: Face alignment via BRM, error

Run pattern           Error
S     N     Kmax      Training     Test
1     25    10        0.0278       0.0317
1     50    10        0.0246       0.0297
2     75    15        0.0199       0.0268
4     100   20        0.0173       0.0259

5 Conclusion

An approach to accelerating the boosting procedure was proposed in
this work. The approach is based on using a special genetic weak
learner to learn a weak classifier on each boosting iteration. The
genetic weak learner uses a genetic algorithm with binary
chromosomes. This genetic algorithm is designed to solve the
optimization problem of selecting the weak classifier with the
smallest weighted loss from a parametric classifier family. The
proposed method was generalized to the case when there exists an
efficient algorithm for learning some of the parameters of a weak
classifier. Experiments have shown that this approach allows us to
accelerate the training process dramatically for practical tasks
while keeping the prediction error small.

The genetic weak learner proposed in this work cannot be used to
boost tree-based classifiers. This limits its use in many scenarios
because stump weak classifiers cannot represent any relationships
between different object features. In future work we therefore plan
to generalize our approach to accelerate tree-based boosting.

Another direction for future research is to perform additional
experiments with classifiers that are not related to Haar features in
any way. That would help confirm the benefit of the proposed
algorithm in computer vision problems that are not biased towards
Haar features. More generally, it would be useful to identify the
parametric classifier families that can be efficiently boosted using
the proposed weak learner.